社会机器人的快速发展刺激了人类运动建模,解释和预测,主动碰撞,人类机器人相互作用和共享空间中共同损害的积极研究。现代方法的目标需要高质量的数据集进行培训和评估。但是,大多数可用数据集都遭受了不准确的跟踪数据或跟踪人员的不自然的脚本行为。本文试图通过在语义丰富的环境中提供运动捕获,眼睛凝视跟踪器和板载机器人传感器的高质量跟踪信息来填补这一空白。为了诱导记录参与者的自然行为,我们利用了松散的脚本化任务分配,这使参与者以自然而有目的的方式导航到动态的实验室环境。本文介绍的运动数据集设置了高质量的标准,因为使用语义信息可以增强现实和准确的数据,从而使新算法的开发不仅依赖于跟踪信息,而且还依赖于移动代理的上下文提示,还依赖于跟踪信息。静态和动态环境。
translated by 谷歌翻译
我们提出了一种基于深度多实例学习的简单高效的图像分类架构,并将其应用于牙科射线照片中龋齿检测的具有挑战性的任务。从技术上讲,我们的方法有两种方式贡献:首先,尽管使用弱图像级标签培训,它尽管培训了本地补丁分类概率的热线图。其次,它可以从分段标签学习,从而指导培训。与现有方法相比,人类用户可以忠实地解释预测并与模型进行交互以决定参加哪些区域。实验是在$ \ SIM $ 38K Bitewings($ \ SIM $ 316K牙齿)的大型临床数据集上进行的,在那里我们与各种基线相比实现了竞争性能。当由外部龋齿分割模型引导时,观察到分类和定位性能的显着改善。
translated by 谷歌翻译
Due to the environmental impacts caused by the construction industry, repurposing existing buildings and making them more energy-efficient has become a high-priority issue. However, a legitimate concern of land developers is associated with the buildings' state of conservation. For that reason, infrared thermography has been used as a powerful tool to characterize these buildings' state of conservation by detecting pathologies, such as cracks and humidity. Thermal cameras detect the radiation emitted by any material and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies, however, reading thermal images might not be quite simple. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this particular phase of this research project, we've used an image classification machine learning model of Convolutional Neural Networks (DCNN) to differentiate three levels of cracks in one particular building. The model's accuracy was compared between the MSX and thermal images acquired from two distinct thermal cameras and fused images (formed through multisource information) to test the influence of the input data and network on the detection results.
translated by 谷歌翻译
In the last decade, exponential data growth supplied machine learning-based algorithms' capacity and enabled their usage in daily-life activities. Additionally, such an improvement is partially explained due to the advent of deep learning techniques, i.e., stacks of simple architectures that end up in more complex models. Although both factors produce outstanding results, they also pose drawbacks regarding the learning process as training complex models over large datasets are expensive and time-consuming. Such a problem is even more evident when dealing with video analysis. Some works have considered transfer learning or domain adaptation, i.e., approaches that map the knowledge from one domain to another, to ease the training burden, yet most of them operate over individual or small blocks of frames. This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model, denoted as Spectral Deep Belief Network. Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process. The experimental results conducted over two public video dataset, the HMDB-51 and the UCF-101, depict the effectiveness of the proposed model and its reduced computational burden when compared to traditional energy-based models, such as Restricted Boltzmann Machines and Deep Belief Networks.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
在多语言甚至单语言中鉴定的模型的零拍跨语言能力刺激了许多假设,以解释这一有趣的经验结果。但是,由于预处理的成本,大多数研究都使用公共模型的公共模型,其预处理方法(例如代币化,语料库规模和计算预算的选择)可能会大不相同。当研究人员对自己的模型预识时,他们通常会在预算有限的情况下这样做,并且与SOTA模型相比,最终的模型的表现可能明显不足。这些实验差异导致有关这些模型跨语性能力的性质的各种不一致的结论。为了帮助对该主题进行进一步研究,我们发布了10个单语字节级模型,并在相同的配置下进行了严格审慎的概述,并具有大型计算预算(相当于V100的420天)和Corpora,比原始BERT大4倍。由于它们不含令牌,因此消除了看不见的令牌嵌入的问题,从而使研究人员可以在具有不同脚本的语言中尝试更广泛的跨语言实验。此外,我们释放了在不自然语言文本上预测的两个模型,这些模型可用于理智检查实验。关于质量检查和NLI任务的实验表明,我们的单语模型实现了多语言的竞争性能,因此可以加强我们对语言模型中跨语性可传递性的理解。
translated by 谷歌翻译
健壮的学习是科学机器学习(SCIML)的重要问题。文献中有几篇关于该主题的作品。但是,对方法的需求不断增加,可以同时考虑SCIML模型识别中涉及的所有不同不确定性组成部分。因此,这项工作提出了一种对SCIML的不确定性评估的综合方法,该方法还考虑了识别过程中涉及的几种不确定性来源。提出的方法中考虑的不确定性是缺乏理论和因果模型,对数据腐败或不完美的敏感性以及计算工作。因此,可以为SCIML领域中的不确定性感知模型提供总体策略。该方法通过案例研究验证,开发了用于聚合反应器的软传感器。结果表明,已识别的软传感器对于不确定性是可靠的,并以所提出的方法的一致性证实。
translated by 谷歌翻译
对心脏周围环境的脂肪库的定量是评估与多种疾病相关的健康风险因素的准确程序。但是,由于人为的工作量,这种类型的评估并未在临床实践中广泛使用。这项工作提出了一种用于自动分割心脏脂肪垫的新技术。该技术基于将分类算法应用于心脏CT图像的分割。此外,我们广泛评估了几种算法在此任务上的性能,并讨论了提供了更好的预测模型。实验结果表明,心外膜和纵隔脂肪分类的平均准确性为98.4%,平均正面速率为96.2%。平均而言,关于分割的患者和地面真相的骰子相似性指数等于96.8%。因此,迄今为止,我们的技术已经获得了心脏脂肪自动分割的最准确结果。
translated by 谷歌翻译
自动面部识别是一个知名的研究领域。在该领域的最后三十年的深入研究中,已经提出了许多不同的面部识别算法。随着深度学习的普及及其解决各种不同问题的能力,面部识别研究人员集中精力在此范式下创建更好的模型。从2015年开始,最先进的面部识别就植根于深度学习模型。尽管有大规模和多样化的数据集可用于评估面部识别算法的性能,但许多现代数据集仅结合了影响面部识别的不同因素,例如面部姿势,遮挡,照明,面部表情和图像质量。当算法在这些数据集上产生错误时,尚不清楚哪些因素导致了此错误,因此,没有指导需要多个方向进行更多的研究。这项工作是我们以前在2014年开发的作品的后续作品,最终于2016年发表,显示了各种面部方面对面部识别算法的影响。通过将当前的最新技术与过去的最佳系统进行比较,我们证明了在强烈的遮挡下,某些类型的照明和强烈表达的面孔是深入学习算法所掌握的问题,而具有低分辨率图像的识别,极端的姿势变化和开放式识别仍然是一个开放的问题。为了证明这一点,我们使用六个不同的数据集和五种不同的面部识别算法以开源和可重现的方式运行一系列实验。我们提供了运行所有实验的源代码,这很容易扩展,因此在我们的评估中利用自己的深网只有几分钟的路程。
translated by 谷歌翻译
当使用基于视觉的方法对被占用和空的空地之间的单个停车位进行分类时,人类专家通常需要注释位置,并标记包含目标停车场中收集的图像的训练集,以微调系统。我们建议研究三种注释类型(多边形,边界框和固定尺寸的正方形),提供停车位的不同数据表示。理由是阐明手工艺注释精度和模型性能之间的最佳权衡。我们还调查了在目标停车场微调预训练型号所需的带注释的停车位数。使用PKLOT数据集使用的实验表明,使用低精度注释(例如固定尺寸的正方形),可以将模型用少于1,000个标记的样品微调到目标停车场。
translated by 谷歌翻译